2D Convolution example using global and shared memory #2228
// Allocate shared memory
auto* const sharedN = alpaka::getDynSharedMem<TElem>(acc);
// Fill shared memory of device so that tile items are accessed from shared memory
if(row < matrixHeight && col < matrixWidth && blockThreadIdx1D < blockThreadExtent.prod())
On the host side you use getValidWorkDiv.
This means you will have one thread per block for some alpaka accelerators.
I know you wrote that the block size must be equal to the tile size, but you do not enforce it, e.g. with an ALPAKA_VERIFY.
If you have only one thread in the block, you can simply iterate over the shared memory to fill it.
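The suggestion above could be sketched roughly as follows. This is a plain C++ host-side simulation, not alpaka code: `fillTile` and `loadTileWithThreads` are hypothetical helpers, the `tile` vector stands in for block shared memory, and zero-filling elements outside the matrix is an assumption. The point is only that a block-stride loop fills the whole tile whether the block has one thread or as many threads as tile elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of alpaka): one "thread" copies its share of a
// tile from a row-major matrix into a buffer that stands in for shared memory.
// threadIdx and threadCount mimic the 1D thread index and thread extent of a block.
void fillTile(
    std::vector<float> const& matrix,
    std::size_t matrixHeight,
    std::size_t matrixWidth,
    std::size_t tileRow0,
    std::size_t tileCol0,
    std::size_t tileSize,
    std::size_t threadIdx,
    std::size_t threadCount,
    std::vector<float>& tile)
{
    // Block-stride loop: this thread handles elements threadIdx,
    // threadIdx + threadCount, ... so no element is skipped even if
    // threadCount is smaller than the number of tile elements.
    for(std::size_t i = threadIdx; i < tileSize * tileSize; i += threadCount)
    {
        std::size_t const row = tileRow0 + i / tileSize;
        std::size_t const col = tileCol0 + i % tileSize;
        // Assumption: elements outside the matrix are zero-filled.
        bool const inside = row < matrixHeight && col < matrixWidth;
        tile[i] = inside ? matrix[row * matrixWidth + col] : 0.0f;
    }
}

// Simulates one block by running fillTile once per thread, sequentially.
std::vector<float> loadTileWithThreads(
    std::vector<float> const& matrix,
    std::size_t matrixHeight,
    std::size_t matrixWidth,
    std::size_t tileRow0,
    std::size_t tileCol0,
    std::size_t tileSize,
    std::size_t threadCount)
{
    std::vector<float> tile(tileSize * tileSize, -1.0f);
    for(std::size_t t = 0; t < threadCount; ++t)
    {
        fillTile(matrix, matrixHeight, matrixWidth, tileRow0, tileCol0, tileSize, t, threadCount, tile);
    }
    return tile;
}
```

Calling `loadTileWithThreads` with a thread count of 1 and with a thread count equal to the number of tile elements yields the same tile contents, which is the property the review comment asks for.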
Yes, but if there is one thread per block, it means the device is not a GPU (or that using a GPU is not a good choice), so we don't know which level of memory is used?
If there is one block, the whole block is loaded into shared memory in the code.
An example: a 2D convolutional filter applied to a matrix. In the first commit, the values of the filter matrix were kept in constant memory. Because the GitLab pipeline failed with the error "The SYCL backend does not support global device constants", constant memory usage was removed in the second commit.